
23 research outputs found

Finding regulatory DNA motifs using alignment-free evolutionary conservation information

Author: Gordân Raluca
Hartemink Alexander J.
Narlikar Leelavati
Publication venue: Oxford University Press
Publication date: 01/01/2010

As an increasing number of eukaryotic genomes are being sequenced, comparative studies aimed at detecting regulatory elements in intergenic sequences are becoming more prevalent. Most comparative methods for transcription factor (TF) binding site discovery make use of global or local alignments of orthologous regulatory regions to assess whether a particular DNA site is conserved across related organisms, and thus more likely to be functional. Since binding sites are usually short, sometimes degenerate, and often independent of orientation, alignment algorithms may not align them correctly. Here, we present a novel, alignment-free approach for using conservation information for TF binding site discovery. We relax the definition of conserved sites: we consider a DNA site within a regulatory region to be conserved in an orthologous sequence if it occurs anywhere in that sequence, irrespective of orientation. We use this definition to derive informative priors over DNA sequence positions, and incorporate these priors into a Gibbs sampling algorithm for motif discovery. Our approach is simple and fast. It requires neither sequence alignments nor the phylogenetic relationships between the orthologous sequences, yet it is more effective on real biological data than methods that do
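The relaxed definition of conservation in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes exact matching of fixed-width sites, and the function names (`reverse_complement`, `is_conserved`, `conservation_scores`) and the fraction-of-orthologs score are hypothetical choices made here. The abstract does not say how the conservation information is turned into a prior.

```python
# Sketch of the alignment-free conservation idea: a site counts as conserved
# in an orthologous sequence if it occurs anywhere in that sequence,
# in either orientation. Exact string matching is an assumption.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(site: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(site))


def is_conserved(site: str, ortholog: str) -> bool:
    """True if the site, or its reverse complement, occurs anywhere in ortholog."""
    return site in ortholog or reverse_complement(site) in ortholog


def conservation_scores(reference: str, orthologs: list[str], width: int) -> list[float]:
    """For each start position in reference, the fraction of orthologs
    in which the width-long site at that position is conserved.
    (Hypothetical score, chosen for illustration only.)"""
    scores = []
    for start in range(len(reference) - width + 1):
        site = reference[start:start + width]
        hits = sum(is_conserved(site, ortholog) for ortholog in orthologs)
        scores.append(hits / len(orthologs))
    return scores
```

No alignment and no phylogenetic tree enter the computation: each ortholog is scanned independently, which is what makes the definition insensitive to relocation and orientation of a site.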

CiteSeerX

PubMed Central

A Nucleosome-Guided Map of Transcription Factor Binding Sites in Yeast

Author: Alexander J Hartemink
Leelavati Narlikar
Raluca Gordân
Satoru Miyano
Publication venue: Public Library of Science
Publication date: 01/01/2007

Finding functional DNA binding sites of transcription factors (TFs) throughout the genome is a crucial step in understanding transcriptional regulation. Unfortunately, these binding sites are typically short and degenerate, posing a significant statistical challenge: many more matches to known TF motifs occur in the genome than are actually functional. However, information about chromatin structure may help to identify the functional sites. In particular, it has been shown that active regulatory regions are usually depleted of nucleosomes, thereby enabling TFs to bind DNA in those regions. Here, we describe a novel motif discovery algorithm that employs an informative prior over DNA sequence positions based on a discriminative view of nucleosome occupancy. When a Gibbs sampling algorithm is applied to yeast sequence-sets identified by ChIP-chip, the correct motif is found in 52% more cases with our informative prior than with the commonly used uniform prior. This is the first demonstration that nucleosome occupancy information can be used to improve motif discovery. The improvement is dramatic, even though we are using only a statistical model to predict nucleosome occupancy; we expect our results to improve further as high-resolution genome-wide experimental nucleosome occupancy data becomes increasingly available
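To illustrate how an informative prior over sequence positions can enter a Gibbs sampling step, here is a simplified sketch. It assumes exactly one binding site per sequence and a fixed motif model given as a list of per-position base probabilities; the names `site_posterior` and `sample_site` are hypothetical, and the published algorithm may differ in its model and in how the prior is computed from nucleosome occupancy.

```python
import random

# One Gibbs-style sampling step with a positional prior (illustrative only).
# pwm: list of dicts, pwm[j][base] = probability of base at motif position j.
# background: dict, background[base] = background probability of base.
# prior: list of non-negative weights, one per candidate start position.


def site_posterior(seq, pwm, background, prior):
    """Posterior over start positions: prior[i] times the motif-vs-background
    likelihood ratio of the site starting at i, normalised to sum to 1."""
    width = len(pwm)
    weights = []
    for start in range(len(seq) - width + 1):
        ratio = 1.0
        for j, base in enumerate(seq[start:start + width]):
            ratio *= pwm[j][base] / background[base]
        weights.append(prior[start] * ratio)
    total = sum(weights)
    return [w / total for w in weights]


def sample_site(seq, pwm, background, prior, rng=random):
    """Draw a start position from the posterior (assumes one site per sequence)."""
    probs = site_posterior(seq, pwm, background, prior)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With a uniform prior the step reduces to the usual likelihood-ratio sampling; a non-uniform prior shifts probability mass towards positions it favours, for example positions predicted to be nucleosome-free.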

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central


Machine learning prediction of non-attendance to postpartum glucose screening and subsequent risk of type 2 diabetes following gestational diabetes

Author: Ghebremichael-Weldeselassie Yonas
Narlikar Leelavati
Parkhi Durga
Patel Vinod
Periyathambi Nishanthi
Saravanan Ponnusamy
Siddharthan Rahul
Sukumar Nithya
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2022

Objective: The aim of the present study was to use a machine learning algorithm to identify the factors associated with non-attendance of the immediate postpartum glucose test following a pregnancy with gestational diabetes mellitus (GDM). Method: A retrospective cohort study of all women with GDM (n = 607) whose postpartum glucose test was due between January 2016 and December 2019 at the George Eliot Hospital NHS Trust, UK. Results: Sixty-five percent of women attended the postpartum glucose test. Type 2 diabetes was diagnosed in 2.8% and 21.6% had persistent dysglycaemia at 6–13 weeks post-delivery. Those who did not attend the postpartum glucose test appeared to be younger, multiparous and obese, and to have continued smoking during pregnancy. They also had higher fasting glucose at the antenatal oral glucose tolerance test. Our machine learning algorithm predicted postpartum glucose non-attendance with an area under the receiver operating characteristic curve of 0.72. The model could achieve a sensitivity of 70% with 66% specificity at a risk score threshold of 0.46. A total of 233 (38.4%) women attended a subsequent glucose test at least once within the first two years of delivery and 24% had dysglycaemia. Compared to women who attended the postpartum glucose test, those who did not attend had a higher conversion rate to type 2 diabetes (2.5% vs 11.4%; p = 0.005). Conclusion: Postpartum screening following GDM is still poor. Women who did not attend postpartum screening appear to have higher metabolic risk and higher conversion to type 2 diabetes by two years post-delivery. A machine learning model can predict which women are unlikely to attend the postpartum glucose test using simple antenatal factors. Enhanced, personalised education of these women may improve postpartum glucose screening
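For readers unfamiliar with how sensitivity and specificity follow from a risk score threshold, the arithmetic can be sketched as below. The function name, the convention that a score at or above the threshold counts as a predicted non-attendance, and any data passed in are assumptions for illustration; nothing here reproduces the study's model or data.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Compute (sensitivity, specificity) for binary labels (1 = positive,
    e.g. non-attendance) when scores >= threshold are predicted positive.
    The >= convention is an assumption; some tools use a strict >."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    sensitivity = tp / (tp + fn)  # fraction of true positives detected
    specificity = tn / (tn + fp)  # fraction of true negatives correctly cleared
    return sensitivity, specificity
```

Lowering the threshold flags more women as likely non-attenders, which raises sensitivity at the cost of specificity; sweeping the threshold over all values traces out the receiver operating characteristic curve.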

Open Research Online (The Open University)

PubMed Central

Warwick Research Archives Portal Repository

Towards a Complete Transcriptional Regulatory Code: Improved Motif Discovery Using Informative Priors

Author: Narlikar Leelavati

Transcriptional regulation is the primary mechanism employed by the cell to ensure coordinated expression of its numerous genes. A key component of this process is the binding of proteins called transcription factors (TFs) to corresponding regulatory sites on the DNA. Understanding where exactly these TFs bind, under what conditions they are active, and which genes they regulate is all part of deciphering the transcriptional regulatory code. An important step towards solving this problem is the identification of DNA binding specificities, represented as motifs, for all TFs. In spite of an explosion of TF binding data from high-throughput technologies, the problem of motif discovery remains unsolved, due to the short length and degeneracy of binding sites. We introduce PRIORITY, a Gibbs sampling-based approach, which incorporates informative positional priors into a probabilistic framework, to find significant motifs from high-throughput TF binding data. We use different data sources to build our positional priors and apply them to yeast ChIP-chip data:
* TFs can be classified into several structural classes based on their DNA-binding domains. Using a Bayesian learning algorithm, we show that it is possible to predict the class of a TF with remarkable accuracy, using information solely from its DNA binding sites. We further incorporate these results in the form of informative priors into PRIORITY, which learns the structural class of the TF in addition to its motif.
* In the nucleus, DNA is present in the form of chromatin, wrapped around nucleosomes, with certain regions being more accessible to TFs than others. It has been shown that functional binding sites are generally located in nucleosome-free regions. We use nucleosome occupancy predictions to compute a novel positional prior that biases the search towards the more accessible regions, thereby enriching the motif signal.
* Functional elements are often conserved across related species. Most conventional methods that exploit this fact use alignments. However, multiple alignments cannot always capture relocation and reversed orientation of binding sites across species. We propose a new alignment-free technique that not only accounts for these transformations, but is much faster than conventional methods.
All our priors significantly outperform conventional methods, finding motifs matching literature for 52 TFs. We produce a genome-wide map of TF binding sites in yeast based on these and other novel motif predictions. Dissertatio

DukeSpace

No Promoter Left Behind (NPLB): learn de novo

Author: Leelavati Narlikar
Sneha Mitra
Publication venue: Oxford University Press (OUP)

Crossref

DIVERSITY finds multiple modes in fly CTCF ChIP data.

Author: Anushua Biswas (5132405)
Leelavati Narlikar (81684)
Sneha Mitra (5132402)

(a) 200bp regions centered around the summit of ChIP peaks, input to DIVERSITY. (b) DIVERSITY reorders and realigns the data, revealing eight modes. (c) Motifs corresponding to modes. CTCF motifs from JASPAR and from high throughput SELEX [25] are shown below. (d) Sequence conservation profile from phastCons, corresponding to nucleotides in b. (e) The eight modes are displayed in decreasing ChIP score. (f) Violin plot of distance of each sequence from the closest transcription start site. (g) Violin plot of expression values of genes (log2(1+RPKM)) with TSS within 2000bp of the ChIP region. Red line shows the median value across all measured genes. (h) Overlaps with Su(Hw) and Pita ChIP experiments, respectively.

FigShare